ABSTRACT


Semantic caching is an important technology for improving 
the response time of future user queries specified over re- 
mote servers. This paper deals with the fundamental query 
containment problem in an XQuery-based semantic caching 
system. To the best of our knowledge, the impact on query
containment of subtle differences in XQuery semantics caused by
different ways of specifying variables has not yet been
studied. We introduce the concept of variable binding
dependencies for representing the hierarchical element
dependencies preserved by an XQuery. We analyze the problem of
XQuery containment in the presence of such dependencies. 
We propose a containment mapping technique for nested 
XQuery in presence of variable binding dependencies. The 
implication of the nested block structure on XQuery
containment is also considered. We also report the performance
gains achieved by a semantic caching system we built based
on the proposed technique.


Categories and Subject Descriptors 


F.3 [Theory of Computation]: Logics and Meanings of 
Programs; I.1.1 [Computing Methodologies]: Expres- 
sions and Their Representation; H.2.8 [Information Sys- 
tems]: Database Applications 


General Terms 


XQuery containment mapping algorithm and theory 


Keywords 


XQuery Containment, variable binding dependency 


1. INTRODUCTION 


Due to its fundamental role in many database applica- 
tions such as query optimization and information integra- 
tion [16], the problem of query containment has received 
considerable attention over the past few decades. With the 
initial focus on relational queries, researchers have recently 
begun to study the containment problem for various frag- 
ments of XPath [13, 1, 22, 12] and XQuery [9, 18, 10]. The 
key technique of containment mapping for relational queries 
[4] has been extended in the new contexts to derive map- 
pings between navigation pattern trees and nested XQuery 
constructs. It has been commonly recognized that extended
containment mapping is central for minimizing XML queries 
[23, 9], and for reformulating queries in a mediator system 
[9] or a peer-to-peer environment [18]. 


1.1 Motivation 


This work is motivated by the promising application of 
semantic caching for answering XML queries using cached 
XML views [6, 8]. The idea of semantic caching is that the 
(mobile) client maintains both the semantic descriptions and 
associated answers of previous queries in its cache, in the 
hope of being able to reuse them to speed up the processing 
of subsequent queries. 

An XQuery-based semantic caching system named ACE- 
XQ has been proposed [6, 7] for facilitating XQuery process- 
ing in the Web environment. The main techniques exploited 
by ACE-XQ include the containment mapping approach for 
nested XQuery, XQuery rewriting, and a multi-granularity 
replacement strategy. While [8] focuses on the proposed
replacement strategy and the cache performance evaluation,
this paper introduces the fundamental query containment
technique underlying ACE-XQ, which is the first comprehensive
practical semantic cache solution for handling
nested conjunctive XQuery.


1.2 The Related Work 


In the XML setting, extensive research has focussed on the 
query containment problem for regular path expressions on 
general cyclic graph databases [3], tree pattern queries and 
XPath queries over XML data [13, 1, 22, 12]. Especially the 
containment problem for XPath and tree pattern queries has 
attracted a lot of attention recently due to the fundamental 
role they play in many XML query languages. 

Different fragments of XPath have been targeted by different
works. A well-recognized core XPath fragment includes the
child axis ‘/’, the descendant axis ‘//’, branching ‘[ ]’, and
the wildcard ‘*’. It is shown in [13] that query containment for
this fragment, denoted \(XP^{\{[\,],*,//\}}\), is coNP-complete. If
any of the three constructs ‘//’, ‘[ ]’, and ‘*’ is dropped,
query containment is in PTIME. The essence of their containment
mapping technique is the polynomial-time tree homomorphism
algorithm¹, which serves as a sufficient but not
necessary condition for containment in \(XP^{\{[\,],*,//\}}\) in
general. On the other hand, if tag variables and equality testing
are allowed, query containment is NP-complete. The complexity
increases to \(\Pi_2^p\) with disjunctions added. We refer
the readers to [1, 22, 12] for discussions of the containment
complexity results under different XPath fragments.

¹ Tree homomorphism and tree embedding are used interchangeably.

However, research on the containment problem for XQuery 
is still in its infancy. Besides using XPath expressions as the 
navigation mechanism, XQuery also employs other query 
constructs such as FLWR expressions and the nesting of 
query blocks. These features make XQuery more expres- 
sive than XPath. On the other hand, they also impose new 
difficulties on the containment problem. Specifically, diffi- 
culties arise since an XQuery cannot simply be represented 
by a navigation tree pattern. Hence containment mapping 
based on tree homomorphism alone is no longer sufficient 
for determining XQuery containment. 

To the best of our knowledge, the containment of nested XQuery
has so far been studied only in [9], [18], and [10]. [9] exploits
XQuery containment for query optimization. It utilizes containment
mapping for identifying redundant navigation patterns
in a query and later for collapsing them to minimize
the query. In [18], the containment of nested XQuery is
studied for the purpose of rewriting queries posed on
one peer to be answered by another peer. [10] studies the
complexity of the problem with respect to completeness.

Targeting different goals, these three works take different
approaches. The containment mapping technique proposed
in [9] essentially extends tree homomorphism between
navigation patterns with additional requirements for mapping
the equality-based where-conditions and the groupby-id and
groupby-value variables. In [18], two types of mappings, i.e.,
a query-head embedding E_head(Q1, Q2) and a query-body
embedding E_body(Q2, Q1), are employed as sufficient conditions
for deriving Q1 ⊑ Q2 (assuming Q1 and Q2 are two
nested XQueries). E_head embeds the block structure of Q1
into that of Q2, while E_body embeds the navigation pattern of
Q2 into that of Q1. In [10], containment of nested XQuery is
defined based on XML instance containment, and the theoretical
complexity of methods that ensure completeness
is established.

Among these three works, [10] presents an approach that 
guarantees completeness (i.e., no false negative answers). 
In answering-queries-using-views scenarios, it is commonly 
considered more crucial to guarantee the soundness while 
the completeness is often ignored to avoid the high com- 
plexity. For example, the containment of nested XQuery in 
general is coNEXPTIME when ensuring completeness [10]. 

In contrast, [9] and [18] attempt to provide more practi- 
cal containment mapping techniques by extending tree ho- 
momorphism with additional mapping conditions. In [9], a 
technique is proposed for identifying redundant navigation 
within one query. It considers the mapping of equality-based 
where-conditions and that of variables distinguished by the 
set or bag semantics they each represent. However, all re- 
turn expressions are considered as black-box functions and 
ignored in the containment mapping process. This containment
mapping technique is hence not a suitable foundation
for determining the containment relationship between two
queries. This is obvious considering the fact that whether 
the direct bindings of variable v or subelements obtained 
from further navigation of v’s bindings are returned does 
make a major difference in the query result. 

Furthermore, neither of the two techniques considers the 
effect of dependencies among variable bindings on the query 
result and consequently on the containment result. In Section
1.3, we give examples of subtle differences in XQuery
semantics caused by different dependencies among the specified
variables. Since these two techniques have failed to
address the critical effect of such differences on the query 
containment result, we propose our containment mapping 
approach which provides sufficient mapping conditions for 
correctly deriving the containment decision. 


1.3 Problem Analysis 


In this work, we target the containment problem for nested 
XQuery. We consider a core fragment of XQuery that allows 
nested blocks, conjunctive equality-based conditions, set and 
bag semantics. Disjunctions, negations, universal quantifier 
and tag variables are not considered. This XQuery fragment 
is the same as that being studied in [9]. [18] and [10] study 
a subset of this fragment as they exclude the bag semantics. 


Q1:
for $t in document("bib.xml")//book/title,
    $a in document("bib.xml")//book/author
return <pairQ1> $t, $a </pairQ1>

Q2:
for $b in document(..)//book,
    $t in $b/title, $a in $b/author
where some $p in $b/price satisfies $p=30
return <pairQ2> $t, $a/last </pairQ2>

Q3:
for $b in document(..)//book
return <pairQ3>
         {for $t in $b/title, $a in $b/author
          return $t, $a}
       </pairQ3>

Q4:
for $b in document(..)//book
return <pairQ4>
         {for $t in $b/title return $t},
         {for $a in $b/author return $a}
       </pairQ4>


Figure 1: Example Queries 


Now let us consider the example queries in Figure 1. All
four queries Qi (i=1..4) specify $t and $a and return their
bindings in the results. Suppose the input document bib.xml
is the one shown in the top left corner of Figure 2. We can see that
their results R_Qi (i=1..4) (also shown in Figure 2) are all
different due to the subtle differences in their variable
specifications and nested block structures. Suppose that the DTD for
bib.xml specifies <!ELEMENT book (title, author*, publisher?,
price?)>. R_Q1 contains six title and author pairs derived
from all combinations of the $t and $a bindings document-wide,
regardless of whether the paired title and author elements
belong to the same book. In contrast, the $t and $a
bindings in Q2 are specified based on $b. Therefore, title
and author elements corresponding to different book parents
do not appear in the same pair in R_Q2. For example, t2 is
paired with a1 and a2 but not with a3 in R_Q2.


[Figure 2 (tree diagrams): the bib.xml document and the result trees R_Q1, R_Q2, R_Q3 and R_Q4 of the four example queries.]

Figure 2: bib.xml and Example Query Results


The differences in the structure of R_Q1 versus that of R_Q2
can be intuitively explained by the differences in how
variable dependencies are specified in Q1 and Q2. That is, the variable $t
in Q2 (denoted by $t_Q2) and $a_Q2 are defined based on $b_Q2,
while $t_Q1 and $a_Q1 are based on $r (i.e., the default root
variable bound to the root element of document("bib.xml")).

We first explain the effect of variable dependencies on the
query result of Q2. When constructing the result
R_Q2, since $t_Q2 and $a_Q2 are defined in the same query
block, a new <pairQ2> element is produced
for each tuple in the cartesian product of the bindings of
$t_Q2 and $a_Q2. Because of how $t_Q2 and $a_Q2 are specified,
the bindings of $t_Q2 and $a_Q2 derived from the same
binding of $b_Q2 preserve the sibling <title>-<author> element
associations under the same parent book element. Such hierarchical
data dependencies in the source XML are preserved
in the intermediate variable bindings based on which the
query result is constructed. In this case, each <pairQ2>
element in R_Q2 combines a binding of $t_Q2 and a binding of
$a_Q2 only if they share the same parent binding of $b_Q2. In
contrast, the sibling <title>-<author> associations are not
kept in the bindings of $t_Q1 and $a_Q1. Q1 hence produces
<pairQ1> elements based on the cartesian product of all the
bindings of $t and $a regardless of their respective parent
book elements. Q2 hence preserves a finer hierarchy of element
dependencies among its intermediate variable bindings
than Q1 does.
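The difference between the two binding schemes can be made concrete with a small simulation. The following Python sketch is only an illustration under stated assumptions: the list BOOKS is a hypothetical stand-in for bib.xml (chosen so that the pair counts agree with those named in the text, although the actual document in Figure 2 may differ), the function names are ours, and Q2 is simplified to return the whole author binding rather than $a/last.

```python
from itertools import product

# Hypothetical stand-in for bib.xml: each dict models one book element.
BOOKS = [
    {"title": "t1", "authors": []},
    {"title": "t2", "authors": ["a1", "a2"], "price": 30},
    {"title": "t3", "authors": []},
]

def eval_q1(books):
    """Q1: $t and $a are both derived from the root, independently of any book."""
    titles = [b["title"] for b in books]
    authors = [a for b in books for a in b["authors"]]
    return list(product(titles, authors))

def eval_q2(books):
    """Q2: $t and $a are derived from the same $b binding (with price = 30)."""
    pairs = []
    for b in books:
        if b.get("price") == 30:
            pairs.extend(product([b["title"]], b["authors"]))
    return pairs

r_q1 = eval_q1(BOOKS)
r_q2 = eval_q2(BOOKS)
# Pairs of R_Q1 whose title and author do not stem from a common qualifying book.
cross_book = [p for p in r_q1 if p not in r_q2]
print(len(r_q1), r_q2, cross_book)
```

Under these assumptions the document-wide pairing yields six pairs, while the per-book pairing keeps only the two pairs formed inside the book that carries t2.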

We now analyze the effect of such preserved dependencies
on the containment result. Suppose that the containment
mapping technique proposed in [18] is applied to Q1 and
Q2 in Figure 1. Q2 ⊑ Q1 would be derived since both
E_head(Q2, Q1) and E_body(Q1, Q2) can be established, as
illustrated in Figure 3². To derive Q2 ⊑ Q1, this approach
utilizes not only the navigation-pattern-based mapping
represented by E_body, but also E_head for checking whether the variables
returned by Q2 are a subset of those returned by Q1. However,
whether such dependencies among variable bindings
influence the query containment result has not been studied
in either [18] or [9].


[Figure 3 (diagram): the block structures and navigation pattern trees of Q1 and Q2, with the E_head and E_body mappings drawn between them.]

Figure 3: Illustration of Containment Mapping via
E_head and E_body


Assume Q2 is answered using R_Q1 based on the containment
result Q2 ⊑ Q1. Then there is no way to regroup
the returned bindings of $t and $a in R_Q1 by their respective
book parent elements as required by Q2. Ignoring this
requirement, the produced result of Q2 would contain
superfluous pairs, namely, t1-a1, t1-a2, t3-a1 and t3-a2.

² As explained in more detail in [18], E_head(Q2, Q1) embeds
the nested block structure of Q2 into that of Q1. The dashed
arrows denote the mappings between blocks within which
the corresponding returned variables match. E_body(Q1, Q2)
embeds the navigation patterns (denoted by the bold tree
edges) specified in Q1 into those in Q2. E_body(Q2, Q1) but
not E_head(Q1, Q2) can be established. Hence Q1 ⋢ Q2.


1.4 Our Contributions 


First, we address the problem of producing superfluous 
answers based on the query containment result when ig- 
noring the effect of variable binding dependencies in the 
containment mapping process. Correspondingly we iden- 
tify some important concepts and their connections, as il- 
lustrated in Figure 4, to this problem. 


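[Figure 4 (diagram), two flows:
 left:  element dependencies (formally HMVDs) --preserved in--> variable binding dep. (VarTree) --reduced to--> essential variable binding dep. (minimal VarTree)
 right: XQuery Containment --concerns--> query result structure (TagTree) --utilizes--> essential variable binding dep. (minimal VarTree)]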

Figure 4: Connection between Preserving of Ele- 
ment Dependencies and XQuery Containment 


The left hand side flow illustrates the preservation of data 
dependencies in the source XML in the intermediate bind- 
ings via the specification of variables. The right hand side 
flow represents the fact that XQuery containment needs to 
take the query result structure constructed based on the 
binding dependencies into consideration. Terms enclosed in 
the parentheses in Figure 4 will be introduced in Section 2. 

Also, we observe that not all the intermediate binding
dependencies preserved by a query are necessarily utilized in
constructing the final result. We therefore call the subset of variable
binding dependencies that are utilized the essential ones;
it is through them that the two flows are connected.

Second, based on our problem analysis, we propose a con- 
tainment mapping technique that considers the containment 
of the utilized binding dependencies in the query result. For 
this, we first decompose the input query and represent the 
two parts of its semantics, i.e., variable binding and result 
construction, by respective tree structures. Then we identify 
the binding dependencies that are preserved by the former 
and utilized by the latter. We call it variable minimization. 
Next we propose to employ three types of containment map- 
pings for deriving the containment decision. 

In sum, we will show that our containment mapping ap- 
proach is more comprehensive than the prior works [9, 18, 
10] in that it deals with the effect of variable binding
dependencies on the query containment result. In other words, it
avoids deriving a containment decision that would
lead to superfluous answers for the contained query
when the result of the containing query is used. Like [9] and [18],
our approach does not necessarily ensure completeness.


1.5 Paper Outline 


The rest of the paper is organized as follows. In Section 2, 
we define the problem of XQuery containment in the pres- 
ence of variable binding dependencies. Section 3 gives the 
overview of our approach. We describe the pre-step of query 
decomposition and minimization in Section 4. This is fol- 
lowed by our containment mapping technique in Section 5. 
We show the query performance gains achieved by applying
the proposed technique in a semantic caching system in
Section 6 and conclude in Section 7.


2. PROBLEM DEFINITION 


In this section, we first introduce the notion of hierarchical 
multivalued dependencies (HMVDs) which represent a typi- 
cal type of data dependencies in the source XML. Also, we 
define variable binding dependencies as the HMVDs being 
preserved by the query in the intermediate variable bindings. 
We then define the problem of nested XQuery containment 
in the presence of variable binding dependencies. 


2.1 Hierarchical Multivalued Dependencies 


It has been recently recognized that studying the exten- 
sion of the traditional integrity constraints in the XML set- 
ting is both theoretically and practically meaningful. Sev- 
eral classes of integrity constraints including key constraints, 
path constraints, functional constraints, and inclusion con- 
straints have been defined for XML [11]. The more advanced 
constraints such as the multivalued dependencies (aka tuple 
generating dependencies) have also been studied in [19, 2] 
with the goal to develop a normalization theory for XML 
and in [15] for mapping XML DTDs to relational schemas. 

XPath containment in the presence of DTD constraints 
such as sibling constraints and functional constraints has 
been investigated in [11]. The semantics of an XPath query 
can be captured by a unary pattern tree in which only one 
node has its bindings returned as the result while others are 
matched but not returned. However in XQuery, even a single 
for clause may specify multiple variables which correspond 
to an n-ary (n>1) pattern tree. This is where the challenges 
arise for XQuery containment. 

Let us first analyze the semantics of a single-block XQuery
for the sake of simplicity. In a single-block XQuery that
utilizes a FLWR expression, the return clause is invoked
for every combination in the cartesian product of the variable
bindings produced by the for clause. Which combinations arise
is determined by how each variable is defined in terms of
the others. As far as we know, no research has studied the
implication of such dependencies on XQuery containment.
This is the task of our work.


DEFINITION 2.1. Given a DTD, suppose E is a set of binary
edge relations between element type e and its children
element types, each labeled with the corresponding cardinality
relationship³ 1, ?, * or +. For any two descendant element
types x and y of e, if either x or y has a multiple cardinality
relationship (i.e., * or +) with e, then we call the dependency
among their corresponding elements in a conforming
XML document a hierarchical multivalued dependency (HMVD),
denoted e ->> x|y.


Recall that the notion of multivalued dependency (MVD) 
in relational databases defines that if a relation has two or 
more multivalued independent attributes (e.g., x and y), ev- 
ery value of one attribute (e.g., x) must be repeated with 
every value of the other attribute (e.g., y). HMVD extends 
MVD in the sense that e, x and y are not attributes in a re- 
lation but element types in a DTD. If e, x and y are mapped 
to a 3-column relation and their bindings are unnested, then 
in each partition with an e binding, every x binding needs 
to be repeated with every y binding. 
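To illustrate Definition 2.1, the following Python sketch enumerates the HMVDs rooted at a given element type. It is a minimal sketch under stated assumptions: the dictionary DTD is our hypothetical encoding of the book declaration from the running example, the DTD is assumed to be non-recursive, and a descendant is assumed to have a multiple cardinality relationship with e whenever some edge on the path from e to it is labeled * or +.

```python
from itertools import combinations

# Hypothetical encoding: parent element type -> list of (child type, cardinality).
DTD = {
    "bib": [("book", "*")],
    "book": [("title", "1"), ("author", "*"), ("publisher", "?"), ("price", "?")],
}

def descendants(dtd, e, multi=False):
    """Yield (type, is_multiple) for every descendant element type of e.

    Assumption: a descendant is 'multiple' with respect to e if any edge
    on the path from e down to it is labeled * or +.
    """
    for child, card in dtd.get(e, []):
        child_multi = multi or card in ("*", "+")
        yield child, child_multi
        yield from descendants(dtd, child, child_multi)

def hmvds(dtd, e):
    """Return the unordered pairs {x, y} for which e ->> x|y holds."""
    desc = dict(descendants(dtd, e))
    return {
        frozenset((x, y))
        for x, y in combinations(sorted(desc), 2)
        if desc[x] or desc[y]
    }

book_hmvds = hmvds(DTD, "book")
print(sorted(tuple(sorted(pair)) for pair in book_hmvds))
```

With this encoding, every pair involving author qualifies under book (for instance book ->> title|author), whereas title and price do not form an HMVD because neither is multi-valued.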

³ 1, ?, * and + respectively represent the 1-1, 1-(0,1), 1-(0,m)
and 1-(1,m) (m > 1) mappings.


2.2 Variable Binding Dependency


For an XML document D, the dependencies among its 
elements which have multiple cardinality relationships with 
their respective parents can be represented by HMVDs. A 
query imposed against D specifies a subset of HMVDs (di- 
rect or derived) to be preserved by its variable bindings. 


DEFINITION 2.2. Suppose a given query defines variable
v_j based on v_i, e.g., for v_j in v_i(/|//)p_j, where p_j is the
relative XPath expression used for deriving v_j's bindings from
each binding of v_i. We call this dependency of v_j's bindings
on their respective v_i bindings a variable binding dependency,
denoted \(v_i \overset{p_j}{\triangleright} v_j\).


For example, \(\$b \overset{/title}{\triangleright} \$t\) and \(\$b \overset{/author}{\triangleright} \$a\) hold for Q2~Q4
in Figure 1. They all specify their corresponding $t and $a
based on $b.

The variable binding dependency relationship satisfies:

\[ v_i \overset{p_j}{\triangleright} v_j,\; v_j \overset{p_k}{\triangleright} v_k \;\Rightarrow\; v_i \overset{p_{jk}}{\triangleright^{*}} v_k \quad \text{(transitivity rule)}, \]

where p_jk is the path expression obtained by concatenating
p_j and p_k, and \(\triangleright^{*}\) denotes an induced variable binding
dependency. Given an XQuery, the direct variable binding
dependencies extracted from it compose a base dependency
set from which the non-direct dependencies can be derived
inductively.

For example, Q2, Q3 and Q4 in Figure 1 define $b via
an absolute path expression //book from the root of the
source XML "bib.xml". Suppose variable $r is used as a
default root variable bound to the root element;
then \(\$r \overset{//book}{\triangleright} \$b\). Also, since \(\$b \overset{/title}{\triangleright} \$t\) and \(\$b \overset{/author}{\triangleright} \$a\),
we can derive \(\$r \overset{//book/title}{\triangleright^{*}} \$t\) and \(\$r \overset{//book/author}{\triangleright^{*}} \$a\). In
contrast, Q1 in Figure 1 directly defines $t and $a based
on the root variable $r. Hence it has \(\$r \overset{//book/title}{\triangleright} \$t\) and
\(\$r \overset{//book/author}{\triangleright} \$a\).
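The inductive derivation of non-direct dependencies can be sketched as a simple fixpoint computation. The Python code below is an illustration only: the function name induced_dependencies and the dictionary encoding of the base dependency set are our assumptions, and the where-condition of Q2 is left out so that the labels match those used in the example above.

```python
def induced_dependencies(direct):
    """Close a base dependency set under the transitivity rule.

    `direct` maps (parent_var, child_var) -> relative path label.
    The returned dict additionally holds the induced pairs, each labeled
    with the concatenation of the path labels along the way.
    """
    deps = dict(direct)
    changed = True
    while changed:
        changed = False
        for (vi, vj), pj in list(deps.items()):
            for (vj2, vk), pk in list(deps.items()):
                if vj == vj2 and (vi, vk) not in deps:
                    deps[(vi, vk)] = pj + pk
                    changed = True
    return deps

# Direct dependencies of Q2 (filter omitted), with $r as the root variable.
q2_direct = {
    ("$r", "$b"): "//book",
    ("$b", "$t"): "/title",
    ("$b", "$a"): "/author",
}
q2_all = induced_dependencies(q2_direct)
print(q2_all[("$r", "$t")], q2_all[("$r", "$a")])
```

The induced labels coincide with the paths by which Q1 defines $t and $a directly; what Q1 lacks is the intermediate variable $b and hence the dependencies of $t and $a on it.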


2.3 HMVD and XQuery Containment 


To tackle XQuery containment in the presence of vari- 
able binding dependencies, we cannot solely utilize tree ho- 
momorphism between the two respective navigation pattern 
trees. Additional conditions need to be asserted in the con- 
tainment mapping process to deal with the effect of variable 
binding dependencies on the query semantics. 

Let us consider the containment relationship between Q1
and Q2 in Figure 1 again. A tree embedding of the pattern
tree of Q1 into that of Q2 exists, as illustrated by
E_body(Q1, Q2) in Figure 3. As described before, if we were to
use R_Q1 (see Figure 2) to answer Q2 according to Q2 ⊑ Q1,
it would result in the superfluous answer pairs t1-a1,
t1-a2, t3-a1 and t3-a2. With the new concepts introduced
in this section, we can see that this is because the HMVD
$b ->> $t|$a is required by Q2 but not preserved by Q1.

Suppose that Vars(Q) and Rets(Q) are the defined variables
and the returned expressions in a query Q respectively.
All the variables occurring in Rets(Q) must be defined in
Vars(Q) for Q to be safe. On the other hand, the variables
occurring in Rets(Q) may be only a proper subset of Vars(Q). That
is, not all the variable binding dependencies are necessarily utilized in
the query result. To determine query containment, we need
to reason about not only the containment of the returned
bindings due to Rets(Q), but also the containment of the
utilized variable binding dependencies due to both Vars(Q)
and Rets(Q). Correspondingly, we now define XQuery containment
in the presence of variable binding dependencies.


DEFINITION 2.3. Let Q1, Q2 be two XQueries. Q1 ⊑
Q2 if and only if: 1) there exists a containment mapping
from Rets(Q1) to Rets(Q2), and 2) the HMVDs preserved in
Vars(Q1) and utilized by Rets(Q1) are subsumed by those
preserved in Vars(Q2) and utilized by Rets(Q2).


For example, the HMVD $b ->> $t|$a is reflected in R_Q2
in Figure 2 but not in R_Q1. That is, the bindings of $t_Q1
and $a_Q1 are paired document-wide in R_Q1, whereas those of
$t_Q2 and $a_Q2 are grouped by their common book elements
in R_Q2. The former pairs can be derived from the latter by
pairing all the $t_Q1 bindings with the $a_Q1 bindings regardless
of whether they come from the same book parents. However, there
is no way to recover from R_Q1 the dependencies of the $t_Q1 and $a_Q1
bindings on their common book elements as required by Q2.
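That the grouping cannot be recovered from a document-wide pairing can be checked with a small counterexample. The Python sketch below is an illustration under stated assumptions: doc_a and doc_b are two hypothetical documents that differ only in which book owns which author, the function names are ours, and the price condition of Q2 is omitted for brevity.

```python
from itertools import product

def doc_wide_pairs(books):
    """Q1-style result: every title paired with every author in the document."""
    titles = [b["title"] for b in books]
    authors = [a for b in books for a in b["authors"]]
    return sorted(product(titles, authors))

def per_book_pairs(books):
    """Q2-style result (price filter omitted): pairs formed within each book."""
    return sorted((b["title"], a) for b in books for a in b["authors"])

# Two hypothetical documents with the same titles and authors overall.
doc_a = [{"title": "t1", "authors": ["a1"]}, {"title": "t2", "authors": ["a2"]}]
doc_b = [{"title": "t1", "authors": ["a2"]}, {"title": "t2", "authors": ["a1"]}]

same_q1 = doc_wide_pairs(doc_a) == doc_wide_pairs(doc_b)
same_q2 = per_book_pairs(doc_a) == per_book_pairs(doc_b)
print(same_q1, same_q2)
```

Since the two documents give identical Q1-style results but different Q2-style results, no rewriting over the Q1-style result alone can produce the correct Q2-style answer for both.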


3. OVERVIEW OF OUR APPROACH 


The main idea of our XQuery containment approach is to 
incorporate the checking of the containment of the utilized 
HMVDs in addition to the checking of the pattern tree ho- 
momorphism (i.e., the embedding of the containing query 
pattern tree into that of the contained query). The main 
steps of our approach are depicted in Figure 5. 


[Figure 5 (flow diagram): XQueries -> XQuery Normalization & Decomposition -> Variable Minimization -> Containment Mapping -> XQuery Rewriting. Intermediate artifacts labeled in the figure: view query, TagTree, mappings.]


Figure 5: Containment Checking Flow 


• XQuery decomposition. We separate the variable definition
part from the result construction part and represent
each using a tree structure. The former tree (i.e., VarTree)
captures all the preserved HMVDs. It is different from the
navigation pattern tree used in [23], as will be explained
later. The latter tree (i.e., TagTree) is used to represent
the result construction template. The TagTree also indicates
how the preserved HMVDs are utilized in the result
construction.

• Variable minimization. We identify the variables that are
neither directly nor indirectly utilized in the result construction
and degrade them to navigation steps. This way, we derive
a minimal set of variable binding dependencies for which
we conduct the containment checking. This is a critical step
for ensuring the correctness of the containment result.

• Containment mapping. We conduct three types of containment
mappings. First, we perform the minimal VarTree
embedding to check the containment of the utilized HMVDs.
Second, we check the tree embedding relationship between
the navigation patterns. Lastly, we apply a mapping that
deals with the effects of block-structure-induced variable
dependencies on the containment of XQuery.

• XQuery rewriting. If the new query Q1 is contained within
a cached query Q2, then the mapping M_c established in the
containment mapping phase can be used for rewriting Q1
against the query result structure of Q2. The basic idea
is to replace each path expression p in Q1 with its
corresponding path expression p' in the TagTree of Q2 based on
p' = M_c(p) ∘ M_t, where M_t represents the mapping of path
expressions from the VarTree of Q2 to its TagTree. Namely,
p' is computed by the composition of M_c and M_t. We skip
the details of query rewriting in this paper.


4. DECOMPOSITION AND MINIMIZATION 
4.1 XQuery Decomposition 


The purpose of query decomposition is to separate the 
semantics of variable bindings from that of result construc- 
tion. However, the semantic distinction is sometimes not 
very easily extracted from the surface syntax. For example, 
not necessarily all expressions in return clauses represent the 
return construction semantics. Due to the flexibility in com- 
posing a nested XQuery, FLWR expressions may be nested 
within a for clause, e.g., for v2 in (for v1 in e1 return e2)
return e3. In this case, e2 in the nested return clause does
not result in returning its bindings in the ultimate query
result but only serves for specifying v2's binding.
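The effect of unnesting such a FLWR expression can be mimicked with list comprehensions. The Python sketch below is an illustration only: it assumes the commonly used unnested form "for v1 in e1, v2 in e2 return e3", models XQuery sequences as flat Python lists, and uses hypothetical data and helper names (books, authors_of).

```python
def nested_form(e1, e2, e3):
    """Models: for v2 in (for v1 in e1 return e2(v1)) return e3(v2)."""
    inner = [x for v1 in e1 for x in e2(v1)]  # inner FLWR yields one flat sequence
    return [e3(v2) for v2 in inner]

def unnested_form(e1, e2, e3):
    """Models: for v1 in e1, v2 in e2(v1) return e3(v2)."""
    return [e3(v2) for v1 in e1 for v2 in e2(v1)]

# Hypothetical data: e1 binds books, e2 navigates to authors, e3 builds the result.
books = [
    {"title": "t1", "authors": ["a1"]},
    {"title": "t2", "authors": ["a2", "a3"]},
]

def authors_of(book):
    return book["authors"]

nested_result = nested_form(books, authors_of, str.upper)
unnested_result = unnested_form(books, authors_of, str.upper)
print(nested_result == unnested_result, unnested_result)
```

In both forms the inner expression contributes only bindings for v2; nothing it returns reaches the output except through e3, which is why it belongs to the variable binding part of the query.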

Therefore, we need to first normalize the query to derive a 
form based on which this semantic distinction is made easy. 
Then we represent the two semantics respectively using two 
tree structures, which are connected via variable bindings. 


4.1.1 The Normalization Rules in Use 


Our goal is that the normalized query can facilitate the 
separation of the path expressions that are to be output in 
the result from those that are used for specifying variable 
bindings, such that the later query decomposition step is 
made easy. There are a number of XQuery normalization 
techniques [21, 17, 9] available. They overlap in some com- 
monly used normalization rules. For example, unnesting the 
FLWR expression within a for clause (as illustrated before) 
is a standard rule shared by many techniques. 

We adopt a set of query normalization rules including 
rules (R2)~(R5), (R7)~(R10), and (RG1) from [9]. We 
also apply rules (R1), (R6), (R11), and (R12), but in their 
reverse directions. Rule (R13) does not apply in our con- 
text since we exclude disjunctions from our XQuery frag- 
ment. Since we consider the XQuery fragment with no ag- 
gregations, we can also apply the rule that substitutes each 
let-variable with its definition. After applying these rules, 
the query is free of let clauses, empty sequence expressions 
and unit expressions. Also, only return clauses may contain 
nested FWR⁴ expressions.


4.1.2 Decomposition into VarTree and TagTree 


DEFINITION 4.1. Given a normalized XQuery Q, a tree
structure named VarTree = (V, E, L) can be constructed
based on the extracted variable binding dependencies. Each
defined variable is denoted by a var node v ∈ V. Each
dependency \(v_i \overset{p_j}{\triangleright} v_j\) corresponds to an edge e = (v_i, v_j) ∈ E
labeled p_j ∈ L. We refer to e as the derivation edge of v_j.

⁴ The letter L representing let is removed since the normalized
query is let-clause free.


The VarTree is different from the pattern tree concept re- 
ferred to in other research [23]. An edge in the pattern tree 
corresponds to an axis step (/ or //) and the associated el- 
ement type test. In contrast, a derivation edge in VarTree 
denotes the navigation pattern used for deriving a child vari- 
able from its parent. Actually this is indicated by the label 
on a derivation edge which is an XPath expression composed 
of possibly multiple steps and branches. In this sense, the 
VarTree can be considered as a nested tree with each edge 
encapsulating the navigation pattern corresponding to the 
label on it. 
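A VarTree in the sense of Definition 4.1 can be sketched as a small data structure. The following Python code is only an illustration: the class layout and method names are our assumptions rather than the ACE-XQ implementation, and the example instance encodes Q2 with its where-condition already written as a filter on the derivation edge of $b.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class VarTree:
    """Var nodes plus derivation edges labeled with XPath expressions."""
    root: str
    # child variable -> (parent variable, XPath label of its derivation edge)
    derivation: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def add(self, parent: str, child: str, path: str) -> None:
        """Record the derivation edge (parent, child) labeled with `path`."""
        self.derivation[child] = (parent, path)

    def children(self, var: str) -> List[str]:
        """Variables defined directly on top of `var`."""
        return [c for c, (p, _) in self.derivation.items() if p == var]

    def path_from_root(self, var: str) -> str:
        """Concatenate the edge labels from the root down to `var`."""
        labels = []
        while var != self.root:
            var, label = self.derivation[var]
            labels.append(label)
        return "".join(reversed(labels))

# Hypothetical VarTree of Q2; each edge label may span several navigation steps.
vt_q2 = VarTree(root="$r")
vt_q2.add("$r", "$b", "//book[price=30]")
vt_q2.add("$b", "$t", "/title")
vt_q2.add("$b", "$a", "/author")
print(vt_q2.children("$b"), vt_q2.path_from_root("$a"))
```

Unlike a pattern tree, which would need one edge per axis step, this structure keeps a whole path expression on each edge, so the tree has exactly one node per defined variable.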


DEFINITION 4.2. For a normalized XQuery Q, a tree structure
conforming to its nested block structure can be constructed
to represent the result construction semantics. It
is called TagTree = (N, A). Each block node n ∈ N is a
quadruple [V, C, R, T] and each edge a = (n_i, n_j) ∈ A denotes
that block n_j is nested within block n_i. Furthermore,

• V, C, R, and T respectively represent the variables,
where-conditions, return expressions, and to-be-constructed
new elements specified in the corresponding block;

• C is denoted by a forest of constraint pattern trees, each
rooted at a variable defined in the local or an ancestor
block. Equality conditions are associated with the
corresponding node(s);

• If unnesting of the bindings of variables in V results in
a non-empty set and conditions C are satisfied, then
the construction of a new element denoted by T will be
invoked for each tuple in that unnested binding set;

• T may have either none, one, or a sequence of tag
names in the form <t1><t2>...<tn>. This means that
the returns of R will be enclosed by an empty tag, by <t1>
and </t1>, or by <t1><t2>...<tn> and </tn>...</t2></t1>.
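The block quadruple of Definition 4.2 can likewise be sketched as a record type. The Python code below is an illustration under stated assumptions: the class Block, its field layout and the wrap helper are ours, conditions and return expressions are kept as plain strings, and the example instance is a hypothetical encoding of Q3 without the root block.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Block:
    """One TagTree node: the quadruple [V, C, R, T] plus its nested blocks."""
    V: List[str] = field(default_factory=list)  # variables defined in the block
    C: List[str] = field(default_factory=list)  # where-conditions
    R: List[str] = field(default_factory=list)  # return expressions
    T: List[str] = field(default_factory=list)  # tag names of the new elements
    nested: List["Block"] = field(default_factory=list)  # edges to nested blocks

    def wrap(self, content: str) -> str:
        """Enclose content in <t1><t2>...<tn> and </tn>...</t2></t1>."""
        opening = "".join(f"<{t}>" for t in self.T)
        closing = "".join(f"</{t}>" for t in reversed(self.T))
        return f"{opening}{content}{closing}"

# Hypothetical TagTree of Q3: the outer block constructs <pairQ3>,
# the nested block defines and returns $t and $a.
inner = Block(V=["$t", "$a"], R=["$t", "$a"])
outer = Block(V=["$b"], T=["pairQ3"], nested=[inner])
print(outer.wrap("..."), Block(T=["t1", "t2"]).wrap("x"))
```

Here an empty T leaves the content unwrapped, which is one possible reading of the "none" case in the definition.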


We now extend the VarTree structure with a few more
features. Given the TagTree TT_Q of a query Q, we take each
return expression v/p_m in an R of TT_Q and correspondingly
attach to the var node for v in the VarTree VT_Q a leaf node
(also referred to as a ret node). Each ret node represents the
corresponding return expression. To distinguish var nodes
from ret nodes, we use solid circles to denote the former and
hollow circles for the latter.

The second extension is to shift the constraints and conditions
in the C's of TT_Q to be represented in VT_Q. Specifically,
if the constraint pattern represented by the XPath
expressions (with or without variables defined in their
respective where clauses⁵) is derived from v, or the equality-based
conditions are attached to where-variables that are
dependent on v, then we move them into the filter "[ ]" of the
XPath expression p which labels the derivation edge of v in
VT_Q. Intuitively, this can be done because these constraints
and conditions are, in a sense, analogous to relational
selection operations. They hence can be pushed down to be
executed in the navigation pattern matching stage for deriving
variable bindings.


⁵ Where-variables refer to variables defined in where clauses,
while for-variables are those defined in for clauses. Unless
indicated otherwise, “variable” means a for-variable. A
where-variable can be removed since its scope is confined to
the local where clause.
For example, the extended VarTrees and TagTrees of the
example queries Q1 and Q2 are depicted in Figure 6.
Note that the where-condition “some $p in $b/price satisfies
$p=30” in the C of the bottom block in TTQ2 is serialized
into “price=30” and then moved into “[ ]” as the filter
expression for defining $b in VTQ2.


[Figure not reproduced in this text version; the surviving
fragment shows the top block node {$r};;;<results> in both
TagTrees.]

Figure 6: VarTrees and TagTrees of Q1 and Q2


However, we must carry out this VarTree extending process
with caution. The shifting of return expressions in R and
where-conditions in C leaves the query semantics unchanged
only if the to-be-attached var node v is defined in the same
block where R and C are specified. Some return expressions
in R and where-conditions in C refer to variables that are
defined in ancestor blocks. By moving them up along the
nested block hierarchy to be attached to the variables they
refer to, more or fewer bindings than desired may be returned.
For example, suppose the example query Q3 in Figure 1 also
specifies the where-condition “some $p in $b/price satisfies
$p=30”, but in the inner block. Then attaching “[price=30]”
to the definition expression of $b in the outer block may
cause fewer <pairQ3> elements to be generated due to the
pushed-up condition. We hence leave such return expressions
and where-conditions in their original block nodes in the
TagTree.

The VarTree with these extensions is comprehensive enough 
to also represent the to-be-returned bindings and the effect 
of where-conditions on variable bindings. It is also notewor- 
thy that the VarTree and TagTree of a query are connected 
via variables. In particular, all variables in V’s and those 
referred to in R’s in the TagTree must be present as var 
nodes in the VarTree for the query to be safe. 


4.2 Use-based Variable Minimization 


We explained earlier that the VarTree of a query is a 
nested tree with navigation patterns encapsulated in its deriva- 
tion edges. On one hand, the query semantics stays the same 
if we fully expand the VarTree by unnesting all the encap- 
sulated navigation patterns and by naming each node in 
them with a variable. On the other hand, it is also possible 
not to affect the query semantics by degrading some vari- 
ables into navigation pattern nodes to be encapsulated in 
derivation edges. We call the latter a variable minimization
process since the number of var nodes is reduced (however
with more complex navigation patterns encapsulated) and
the VarTree structure seems minimized. A variable can be
minimized without affecting the query semantics only if it
does not participate in preserving any HMVD that is utilized
in the result construction, nor serve in any way as a
constraint context (explained later) for the return
expressions.

Our goal here is to explore the opportunities for variable 
minimization to obtain the minimal VarTree (i.e., no further 
minimization is possible). This is critical since the later 
containment checking of utilized HMVDs can be based on 
the derived minimal VarTrees of two given queries. 


DEFINITION 4.3. Given an XQuery Q, suppose D is the 
source XML and v is a variable defined in Q. If by substi- 
tuting all occurrences of v with v’s definition, Q’s result will 
not change for any XML data instance that conforms to the 
same DTD as D, then we say v is nonessential. Otherwise 
v is essential. 


Now we provide practical criteria for distinguishing essen- 
tial variables from non-essential ones based on their uses. 


Explicit vs. Implicit Uses. A variable v may either be 
used for defining another variable or in a return expression. 
We call the former case a Var use of v and the latter a Ret 
use of v. Both are referred to as explicit uses of v in general, 
regardless of where it is used (i.e., either in the local block 
where v is defined or in descendant blocks). 

Besides explicit uses, v may also be implicitly used as a 
“loop counter” for invoking returns. For example, when the 
block where v is defined encloses return expressions referring 
to other variables than v, then the cardinality of v’s bindings 
is used to determine the number of times that the returns 
are to be invoked. In the extreme case when the binding set 
is empty (i.e., cardinality is 0), no return will be invoked. In 
this sense, v serves as the constraint context for the returns. 

If a variable v has neither explicit nor implicit uses, we
say that it has no use. Such variables are definitely
nonessential and can be minimized. Otherwise, the essentiality
of v depends on the combination of different uses and the
number of variable use occurrences.


One vs. Multiple Uses. Basically, v is essential if it 
has at least two explicit uses, being either Var or Ret uses, 
or a Ret use and an implicit use. The detailed case studies 
and rationale are depicted in Figure 7. 


Essential Variable Identification Procedure

if v has no explicit use
    if v has no implicit use either                      (case 1: no-use)
        then v is nonessential
        else v is essential
             # since removing v would cause the loss of the "loop counter".
else if v has more than one explicit use (Var or Ret)    (case 2: multiple uses)
    then v is essential
         # since it is necessary for preserving the HMVDs among its bindings
         # and those of its dependent variables or return expressions.
else (i.e., exactly one explicit use)
    if v has no implicit use                             (case 3: one use)
        then v is nonessential
             # since no two variables or return expressions have co-dependencies
             # with v, minimizing v causes no loss of HMVDs or condition changes.
    else
        if v has one Ret use                             (case 4: one ret with implicit uses)
            then v is essential
                 # since removing v would cause the loss of the "loop counter".
        else (i.e., v has one Var use)                   (case 5: one var with implicit uses)
            v is nonessential
            # since no return expression will be affected by only v but not u
            # due to their common life scope, and v occurs only in u's definition.

Figure 7: Identifying Variable Essentiality


LEMMA 4.1. All essential variables can be correctly iden- 
tified by our analysis in Figure 7 based on variable uses. 
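The case analysis of Figure 7 can be read as a small decision function. The following is a minimal sketch under that reading; the function name is_essential and its three parameters (counts of Var and Ret uses plus a flag for the implicit "loop counter" use) are hypothetical, and how those counts are extracted from a query is not shown.

```python
def is_essential(var_uses: int, ret_uses: int, implicit_use: bool) -> bool:
    """Classify a variable by its uses, following the five cases of Figure 7."""
    explicit = var_uses + ret_uses
    if explicit == 0:
        # Case 1 (no use at all) is nonessential; with only an implicit use
        # the variable still acts as a loop counter and must be kept.
        return implicit_use
    if explicit > 1:
        # Case 2: several explicit uses share the variable's bindings.
        return True
    if not implicit_use:
        # Case 3: exactly one explicit use and nothing else.
        return False
    # Exactly one explicit use plus an implicit use:
    # case 4 (one Ret use) is essential, case 5 (one Var use) is not.
    return ret_uses == 1


# $t in Example 4.1: one Ret use, no implicit use (case 3).
print(is_essential(var_uses=0, ret_uses=1, implicit_use=False))  # prints False
```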
EXAMPLE 4.1. We use Q4 in Figure 2 to illustrate the
minimization process. Before minimization (as shown in
Figure 8), the var node for $tQ4 (denoted by the solid
circles) in VTQ4 and that for $aQ4 each have one dependent
ret node (denoted by the hollow circles). Hence $tQ4 and
$aQ4 each have one Ret use. Also, the original TTQ4 reveals
that $tQ4 and $aQ4 are each specified alone in a bottom
block. Thus they have no chance to affect any other
return. This means that $tQ4 and $aQ4 each have no implicit
use. Therefore, the var nodes for $tQ4 and $aQ4 in VTQ4 can
be minimized according to the analysis of case 3 in Figure 7.
Correspondingly, the XPath expressions on the ret nodes are
changed to $b/title and $b/author respectively by
substituting the variable occurrences with their definitions.


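[Figure not reproduced in this text version. Recoverable
content: before minimization, TTQ4 contains the block nodes
{$r};;;<results>, {$b};;;<pairQ4>, and {$a};;{$a};<>; after
minimization, it contains {$r};;;<results> and
{$b};;{$b/title,$b/author};<pairQ4>. In VTQ4, $b is derived
via //book.]

Figure 8: Minimization Example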
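The substitution step of Example 4.1 amounts to replacing each occurrence of a nonessential variable with its definition. A minimal string-level sketch follows; the helper name inline_variable is hypothetical, and a real implementation would presumably operate on the VarTree rather than on expression text.

```python
import re


def inline_variable(expr: str, var: str, definition: str) -> str:
    """Replace every occurrence of var in expr with its definition.

    The lookahead keeps a short name such as $t from matching inside a
    longer name such as $title.
    """
    pattern = re.escape(var) + r"(?!\w)"
    return re.sub(pattern, lambda _match: definition, expr)


# Definitions of the two nonessential variables of Q4 (Example 4.1).
definitions = {"$t": "$b/title", "$a": "$b/author"}
returns = ["$t", "$a"]

minimized = []
for expr in returns:
    for var, definition in definitions.items():
        expr = inline_variable(expr, var, definition)
    minimized.append(expr)

print(minimized)  # prints ['$b/title', '$b/author']
```

The result matches the R component {$b/title,$b/author} of the <pairQ4> block after minimization.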


5. XQUERY CONTAINMENT MAPPING 


In this section, we present our containment mapping tech- 
nique which is composed of three types of mappings. The 
first two mappings are based on the obtained minimal VarTrees, 
while the last one is based on the TagTrees. 


5.1 VarTree-based Containment Mapping 


Given two queries Q1 and Q2, the first mapping checks
the containment of the utilized HMVDs in the two queries by
conducting tree homomorphism (i.e., tree embedding)
between their VarTrees. Suppose the embedding is from VTQ1
to VTQ2. Then the second mapping makes sure that
the navigation pattern used for deriving each var node in
VTQ1 implies a more restricted constraint than that for the
matched var node in VTQ2. These two mappings are called
MAC mapping and MIC mapping respectively, indicating
that the former is conducted at the macroscopic level of the
VarTree (i.e., mapping of var nodes) while the latter is
performed at the microscopic level of the VarTree (i.e., mapping
of navigation patterns encapsulated in derivation edges).


5.1.1 MAC Mapping 


We now extend the traditional tree homomorphism (namely 
based on root, label, and ancestor-descendant relationship 
preserving) to define the MAC mapping. 


DEFINITION 5.1. Suppose VT1 and VT2 are the minimal
VarTrees of Q1 and Q2 respectively. For determining
Q1 ⊑ Q2, there must be a MAC mapping from VT1 to VT2,
denoted by Φ(VT1) = VT2, such that the following conditions
are satisfied:

C1) roots(VT1) ⊆ roots(VT2),

C2) for any node u ∈ VT1, there is a match Φ(u) ∈ VT2
    such that T(u) = T(Φ(u)) if Φ(u) is a var node, and
    T(u) <: T(Φ(u)) if Φ(u) is a ret node (T returns
    the type of the element, and <: denotes the
    subtype-supertype relationship),

C3) for all u, v ∈ VT1, u is an ancestor of v if and only
    if Φ(u) is an ancestor of Φ(v) in VT2, and

C4) if u is a var node in VT1, then Φ(u) is either a var
    or a ret node; if u is a ret node, then Φ(u) must be
    a ret node.


Below we explain each of these required conditions. 


C1: Root inclusion⁶. This condition requires that
each source XML document referred to in Q1 must also
be referred to in Q2. Correspondingly in the VarTrees,
roots(VT1) returns the URLs of the source XML documents
involved in Q1, which should be a subset of those returned
by roots(VT2).


C2: Mapping of element types. This condition requires
a total but not necessarily injective mapping from
nodes in VT1 to those of VT2. In addition, a node u in VT1
must be mapped to a node in VT2 that has either the same
type as u or a supertype⁷ of u’s type, depending on whether
the matched node is a var node or a ret node. The element
type of a node can be inferred from the XPath expression on
its incoming derivation edge. u can be mapped to a super-type
ret node Φ(u) because the associated bindings of Φ(u) are
all deeply returned (due to the semantics of a return
expression), which enables the retrieval of u’s bindings from
subtrees of Φ(u)’s bindings in Q2’s result.

C3: Preservation of ancestorships. In a minimal
VarTree, nodes represent essential variables and the HMVDs
among them are captured by their ancestor-descendant
relationships. Therefore, if all the ancestor-descendant
relationships in VT1 have corresponding mappings in VT2, then
the to-be-utilized HMVDs required by Q1 are
all preserved by Q2 and also present in Q2’s result.


C4: Correspondence of construct types. This condition
checks the correspondence between query construct
types. A var node represents a for expression while a ret
node denotes a return expression. The bindings of a ret
node are definitely returned whereas those of a var node
may be used for constructing new elements correspondingly.
Therefore, a var node can be mapped to a ret node and still
get the correct bindings, while a ret node cannot be mapped
to a var node since the new elements in Q2’s result rather
than the original bindings would be returned in doing so.


We can see from the above conditions that the MAC mapping
ensures that all the essential variable bindings, the
HMVDs among them, and their attached returns required
by Q1 are preserved in the result of Q2.
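Conditions C1 to C4 can be verified mechanically once a candidate mapping is given. The sketch below checks a supplied mapping rather than searching for one; all names (Node, check_mac, the node labels, the element type "bib", and the URL "bib.xml") are hypothetical, and the subtype test treats equal types as acceptable for ret-node matches, which is our reading of C2.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Node:
    kind: str                # "var" or "ret"
    elem_type: str           # element type inferred from the derivation edge
    parent: Optional[str]    # name of the parent node, None at the top


Tree = Dict[str, Node]


def is_ancestor(tree: Tree, a: str, d: str) -> bool:
    """True if node a is a proper ancestor of node d."""
    p = tree[d].parent
    while p is not None:
        if p == a:
            return True
        p = tree[p].parent
    return False


def check_mac(vt1: Tree, roots1: Set[str], vt2: Tree, roots2: Set[str],
              phi: Dict[str, str], subtypes: Set[Tuple[str, str]]) -> bool:
    if not roots1 <= roots2:                                  # C1
        return False
    for u, node in vt1.items():
        if phi.get(u) not in vt2:                             # phi must be total
            return False
        image = vt2[phi[u]]
        same = node.elem_type == image.elem_type
        sub = (node.elem_type, image.elem_type) in subtypes
        if not (same or (image.kind == "ret" and sub)):       # C2
            return False
        if node.kind == "ret" and image.kind != "ret":        # C4
            return False
    return all(is_ancestor(vt1, u, v) == is_ancestor(vt2, phi[u], phi[v])
               for u in vt1 for v in vt1)                     # C3


def tree(last_ret_type: str) -> Tree:
    """A VarTree shaped like those of Q2 and Q3; only one ret type differs."""
    return {
        "r": Node("var", "bib", None),
        "b": Node("var", "book", "r"),
        "t": Node("var", "title", "b"),
        "a": Node("var", "author", "b"),
        "t_ret": Node("ret", "title", "t"),
        "a_ret": Node("ret", last_ret_type, "a"),
    }


vt2, vt3 = tree("last"), tree("author")
phi = {name: name for name in vt2}
print(check_mac(vt2, {"bib.xml"}, vt3, {"bib.xml"}, phi, {("last", "author")}))
# prints True
```

Mapping a ret node onto a var node, or swapping the roles of the two trees so that an author ret node would have to match a last ret node, makes the check fail.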


5.1.2 MIC Mapping 


⁶ Our technique allows a query to involve more than one
XML document. In this case, the corresponding VarTree
is actually a forest of trees, which may be connected by
equality conditions on variables across trees.

⁷ Here the concept of subtype-supertype is not the same as
that in the object-oriented modeling domain. Instead, it
corresponds to the element inclusion hierarchy in the DTD.
In addition to the MAC mapping, we need to check if the
binding set of each node in VT1 is indeed a subset of that of
its match in VT2. This is guaranteed by the MIC mapping,
which essentially checks XPath containment.


DEFINITION 5.2. Let VT1 and VT2 be the minimal VarTrees
of Q1 and Q2 respectively. Suppose Φ(VT1)=VT2 according
to the MAC mapping. In MIC mapping, tree homomorphism
is checked between the encapsulated navigation patterns
for each pair of matched nodes. Two steps are carried
out for each node u in VT1:

1. If u ∉ roots(VT1), concatenate the XPath expressions
   along the path from Φ(parent(u)) to Φ(u);

2. Assume that the XPath expression on the derivation
   edge of u is p1 and the one obtained from step (1) is p2.
   p1 ⊆ p2 is checked, with ⊆ denoting XPath containment
   (i.e., there is a tree homomorphism from the pattern
   tree representation of p2 to that of p1).


Note that if a pair of parent-child nodes (p,c) in VT1
maps to a pair of ancestor-descendant nodes (a,d) in VT2
by the MAC mapping, then p2 is the concatenation of the
XPath expressions along the path from a to d. This implies
that, for Q1 ⊑ Q2 to hold, more essential variables may be
specified in Q2 than in Q1 to preserve more HMVDs in Q2’s
result. The MIC mapping makes sure that p2, the XPath
expression used for deriving d’s bindings from a’s, imposes a
less restricted pattern constraint than p1, the XPath
expression used for deriving c’s bindings from p’s.
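Step 2 of the MIC mapping relies on a homomorphism between pattern trees. The following is a minimal sketch of such a test for hand-built pattern trees with child edges, descendant edges, branches, and wildcards; the class Pat and the functions embeds and contained_in are hypothetical names, XPath parsing is omitted, and value predicates such as price=30 are simplified to a branch labelled price. For the full fragment a homomorphism is a sufficient but not a necessary condition for containment.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Pat:
    label: str                                   # element name, "*" or "." (context)
    children: List[Tuple[str, "Pat"]] = field(default_factory=list)  # (axis, child)
    output: bool = False                         # marks the node whose bindings are derived


def descendants(p: Pat) -> Iterator[Pat]:
    """All proper descendants of p in the pattern tree."""
    for _axis, child in p.children:
        yield child
        yield from descendants(child)


def embeds(q: Pat, p: Pat) -> bool:
    """True if pattern node q (from p2) can be mapped onto pattern node p (from p1)."""
    if q.label != "*" and q.label != p.label:
        return False
    if q.output and not p.output:
        return False
    for axis, qc in q.children:
        if axis == "/":
            candidates = [pc for ax, pc in p.children if ax == "/"]
        else:  # "//" may be matched by any proper descendant
            candidates = list(descendants(p))
        if not any(embeds(qc, pc) for pc in candidates):
            return False
    return True


def contained_in(p1: Pat, p2: Pat) -> bool:
    """Sufficient test for p1 ⊆ p2: embed the pattern tree of p2 into that of p1."""
    return embeds(p2, p1)


# //book[price] and //book, both relative to a context node ".".
book_with_price = Pat(".", [("//", Pat("book", [("/", Pat("price"))], output=True))])
any_book = Pat(".", [("//", Pat("book", output=True))])

print(contained_in(book_with_price, any_book))  # prints True
print(contained_in(any_book, book_with_price))  # prints False
```

Note that the embedding runs from the less restrictive pattern into the more restrictive one, which mirrors the direction stated in step 2.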


Figure 9: MAC Mapping between Minimal VarTrees 


EXAMPLE 5.1. Figure 9 illustrates two MAC mappings.
One is between the two VarTrees of Q1 and Q2 in Figure 1.
Φ(VT2) ≠ VT1, as shown on the left-hand side: for one, the
var node $b in VT2 has no match in VT1 that satisfies C2.
We can hence derive Q2 ⋢ Q1.

The second mapping is between the two VarTrees of Q2
and Q3 in Figure 1. The right-hand side of Figure 9 shows
Φ(VT2)=VT3, i.e., Φ($rQ2)=$rQ3, Φ($bQ2)=$bQ3,
Φ($tQ2)=$tQ3, Φ($aQ2)=$aQ3, Φ($tQ2)=$tQ3, and
Φ($a/last)=$aQ3 (the $tQ2, $tQ3, and $aQ3 in the last two
mappings are the ret nodes). The mapping Φ($a/last)=$aQ3
holds due to T($a/last) = last, T($aQ3) = author, and
last <: author.

The MIC mapping between the navigation pattern trees
encapsulated in the derivation edges of VT2 and those of
VT3 is also successful. For example, the pattern tree for the
XPath expression “//book” in deriving $bQ3 can be embedded
into that for “//book[price=30]” in deriving $bQ2. Note that
the tree embedding direction for XPath containment p1 ⊆ p2
is from p2 to p1.

⁸ Our XQuery fragment allows XPath(//, *, [ ]), for which the
complexity of containment is coNP-complete. However, the
XPath containment complexity is reduced to PTIME if only
two out of the three features are included. We refer the
readers to [13] for the details of XPath containment.


5.2 TagTree-based Containment Mapping 


We now address the implications of nested block structure 
on the containment of XQuery. 

One intuitive example of such implications is the reliance
of the return semantics on the emptiness of variable binding
set(s). For example, note that since Q2 in Figure 1 specifies
both $tQ2 and $aQ2 in the outer block, the construction of
a new <pairQ2> element occurs only when a book element
has both title and author subelements. In other words, if
the binding set of $aQ2 is empty for a specific $bQ2 binding,
as for example for b1 and b3 in the source XML in Figure 2,
then there will be no invocation of the return to construct
the new elements.

Contrary to this example, the construction of <pairQ3>
elements for Q3 in Figure 2 is solely based on the bindings of
$bQ3, regardless of the bindings of $tQ3 and $aQ3. The reason
lies in the nested block structure of Q3 (i.e., Q3 has two
query blocks whereas Q2 has just one). While the variables
specified in Q3 are the same as those in Q2, they are
placed in different blocks (i.e., $tQ3 and $aQ3 are specified
and returned in the inner block while $bQ3 is defined in the
outer one), as opposed to being put in the same block as
in Q2. Consequently, Q2 ⊑ Q3. Similarly, we have Q3 ⊑ Q4.

Recall that the TagTree structure of a query conforms to
its nested block structure (see Definition 4.2). The
V and C in an outer block together compose the evaluation
context for those in its descendant blocks. Also, variables in
the same V affect each other in the sense that their Cartesian
product would produce no tuple if any variable member in
V has an empty binding set.
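The effect of an empty binding set on a block can be illustrated with a plain Cartesian product. This is a minimal sketch with made-up binding values; the function name block_tuples is hypothetical.

```python
from itertools import product
from typing import Dict, List


def block_tuples(bindings: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Unnest the bindings of the variables of one block into tuples."""
    names = list(bindings)
    return [dict(zip(names, combo))
            for combo in product(*(bindings[name] for name in names))]


# A book with a title but no author: the product is empty, so a block that
# defines both $t and $a constructs no element for this book.
print(len(block_tuples({"$t": ["Title 1"], "$a": []})))               # prints 0

# A book with one title and two authors yields two tuples.
print(len(block_tuples({"$t": ["Title 2"], "$a": ["A. One", "B. Two"]})))  # prints 2
```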


DEFINITION 5.3. Let TT be the TagTree of query Q and
n=[V,C,R,T] be a block node in it. Variables in V
mutually depend on each other. Also, they all depend on those
variables defined in n’s ancestor block nodes. We call such
dependencies region dependencies and denote them by →.

Intuitively, if there is a variable binding dependency from u
to v, then v can only be defined either in the same block as u
or in a descendant block of it, i.e., u → v. However, a
variable binding dependency from u to v cannot be inferred
from u → v. This is formally stated below.


LEMMA 5.1. For any two variables u and v of Q, if there is
a variable binding dependency from u to v, then u → v.


5.2.1 Block Mapping 


We now define a third mapping that complements the 
previously defined MAC and MIC mappings. 


DEFINITION 5.4. Let TT1 and TT2 be the TagTrees of Q1
and Q2 respectively. The block mapping is a one-to-many
mapping function θ from each block node n of TT1 to nodes
of TT2, denoted by θ(TT1)=TT2, such that n=[V,C,R,T]
in TT1 and its image set S=θ(n) in TT2 satisfy:

C1’) for every variable u ∈ V, Φ(u) ∈ ∪i Vi (ni ∈ S),

C2’) for any two variables w, x ∈ ∪i Vi (ni ∈ S), if w → x,
     then there must be u and v in TT1 such that
     Φ(u)=w, Φ(v)=x, and u → v, and

C3’) any ci ∈ ∪i Ci (ni ∈ S) can be implied by a
     condition c ∈ (C ∪ ∪j Cj) (Cj is the condition set of
     mj, an ancestor block node of n).
C1’: Containment of variables. This condition is
actually used for establishing the θ mappings (i.e., finding for
each block node n in TT1 its image node set S) based on
the VarTree node matches. Intuitively, a block node ni in
TT2 is included in S if any variable in it is the match of any
variable u ∈ V in n. V and ∪i Vi denote the variables in n and
the union of those defined in n’s images ni ∈ S respectively.


C2’: Implication of region dependencies. If a variable
w in an image node ni of TT2 is involved in a region
dependency (e.g., w → x), then C2’ ensures that there must
be a region dependency between the corresponding variables
in block nodes of TT1. In other words, the region
dependencies involving matched variables in TT2 must be a
subset of those among the corresponding variables in TT1.


C3’: Implication of where-conditions. Suppose ni
in TT2 is an image node of the block node n in TT1. C3’
checks if every where-condition ci left in ni can be implied
from a where-condition c either in n or in an ancestor block
of n (i.e., c ∈ C ∪ ∪j Cj, where Cj is the condition set of
mj ∈ ancestors(n)).

In a nutshell, conditions C1’ through C3’ required by the
block mapping make sure that Q1 asserts more restricted
constraints, i.e., region dependencies and where-conditions,
than Q2 does.


EXAMPLE 5.2. Suppose that two adjacently nested blocks
n1 and n2 in TT1 define variables u and v respectively, and
that there is no other variable in n1 or in n2. Suppose also
that Φ(u) = x and Φ(v) = w, and that x and w are
defined in the same block n in TT2. By condition C1’, we
have θ(n1) = {n} and θ(n2) = {n}. We derive x → w and
w → x from the mutual region dependencies asserted by a
block node. Also, from the enclosing relationship between n1
and n2, we know that u → v but v ↛ u. C2’ is not satisfied
based on these facts. Consequently, the block mapping fails
and we derive Q1 ⋢ Q2.
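Conditions C1’ and C2’ can be sketched over a simplified TagTree that records only which variables each block defines and which block encloses which. The function names region_deps and block_mapping_ok are hypothetical, the check is restricted to matched variables as in the explanation of C2’, and C3’ (implication of where-conditions) is left out.

```python
from typing import Dict, List, Set, Tuple


def region_deps(blocks: Dict[str, List[str]],
                parent: Dict[str, str]) -> Set[Tuple[str, str]]:
    """All pairs (u, v) with u -> v: same block, or u's block encloses v's block."""
    deps = set()
    for n, vars_n in blocks.items():
        scope = set(vars_n)              # variables of n and of all its ancestors
        m = parent.get(n)
        while m is not None:
            scope |= set(blocks[m])
            m = parent.get(m)
        for v in vars_n:
            deps |= {(u, v) for u in scope if u != v}
    return deps


def block_mapping_ok(blocks1, parent1, blocks2, parent2, phi) -> bool:
    deps1 = region_deps(blocks1, parent1)
    deps2 = region_deps(blocks2, parent2)
    mapped = {(phi[u], phi[v]) for (u, v) in deps1}
    owner2 = {x: n for n, xs in blocks2.items() for x in xs}
    matched = set(phi.values())
    for vars_n in blocks1.values():
        image = {owner2[phi[u]] for u in vars_n}                # C1': theta(n)
        pool = {x for m in image for x in blocks2[m]} & matched
        for w in pool:
            for x in pool:
                if (w, x) in deps2 and (w, x) not in mapped:    # C2'
                    return False
    return True


# Example 5.2: u and v sit in nested blocks of TT1, x and w share one block of TT2.
nested = ({"n1": ["u"], "n2": ["v"]}, {"n2": "n1"})
flat = ({"n": ["x", "w"]}, {})
print(block_mapping_ok(*nested, *flat, {"u": "x", "v": "w"}))  # prints False
```

Reversing the roles (a single block mapped onto two nested blocks) passes the check, which is in line with Q2 ⊑ Q3 discussed earlier.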


Putting all three types of containment mappings together, 
we now have a sound (not generally complete) solution for 
XQuery containment in the presence of variable binding de- 
pendencies. 


THEOREM 1. Given two XQueries Q1 and Q2, Q1 ⊑ Q2
if there exist a MAC mapping Φ(VT1)=VT2, a MIC mapping
p1 ⊆ p2 (i.e., the encapsulated XPath containment) for every
matched node pair, and a block mapping θ(TT1)=TT2.


6. SYSTEM AND EVALUATION 


Based on our proposed containment mapping approach 
for XQuery, we have designed and implemented a semantic 
caching system called ACE-XQ [6, 7]. The ACE-XQ system 
is developed using Java 1.3. It utilizes the IPSI-XQ engine 
[14] installed at both the cache and the remote server sites to 
execute the rewritten query and the original query respec- 
tively. Source XML documents are hosted at the server. 

The first set of experiments validates our containment
mapping and rewriting methods for XQuery. For this,
we designed a query workload⁹ consisting of queries that
are similar to the W3C use cases [20] and fall within the
scope of our XQuery fragment. The experiment shows that
a query produces the same result whether it is run directly
or is subjected to containment mapping and rewritten
against a containing view result.

⁹ We mainly focus on the “refining” case, in which the hit
ratio of a new query being contained in a cached one is high.

Figure 10: Query Response Times for Different Document
Sizes w/o Caching

The second set of experiments evaluates the query
performance with and without the semantic cache. As
expected, Figure 10 shows that query performance improves
by up to tenfold for the totally contained cases in our setting.

Table 1 shows the breakdown of the query response time
for a contained case into the computation overhead (i.e.,
query decomposition and minimization, containment
mapping, and rewriting) and the query evaluation time. We
see that the overhead is small compared to the
query evaluation time. This implies that although the
complexity of our XQuery containment approach is NP-complete
in general (since all three mappings are tree homomorphism
extensions with additional checking of equivalence
conditions, of the inclusive relationships between element types,
etc.), it is efficient and practical in many real scenarios.


XML Size | Decomp. & Minimization | Cont. Mapping | Query Rewriting | Query Execution
175KB    | 0.8ms                  | 8.8ms         | 5.2ms           | 173.6ms
890KB    | 0.8ms                  | 9.2ms         | 5.4ms           | 1068.8ms
1800KB   | 0.8ms                  | 9.1ms         | 5.2ms           | 4525.4ms


Table 1: Processing Time Decomposition 
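The relative weight of the overhead can be computed directly from the figures in Table 1. The short sketch below does this arithmetic, assuming that the response time is the sum of the three overhead components and the execution time; the function name overhead_share is hypothetical.

```python
# Rows of Table 1: (XML size, decomposition & minimization, containment mapping,
# rewriting, execution), all times in milliseconds.
rows = [
    ("175KB", 0.8, 8.8, 5.2, 173.6),
    ("890KB", 0.8, 9.2, 5.4, 1068.8),
    ("1800KB", 0.8, 9.1, 5.2, 4525.4),
]


def overhead_share(decomp: float, mapping: float,
                   rewriting: float, execution: float) -> float:
    """Fraction of the response time spent on containment-related overhead."""
    overhead = decomp + mapping + rewriting
    return overhead / (overhead + execution)


shares = {size: overhead_share(*times) for size, *times in rows}
for size, share in shares.items():
    print(f"{size}: {share:.1%} of the response time is overhead")
```

Under this assumption the overhead share falls from roughly 8% for the smallest document to well under 1% for the largest, since the overhead stays nearly constant while the execution time grows with the document size.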


Extensive experimental studies can be found in [8, 5]. 


7. CONCLUSION 


In this paper, we proposed a containment mapping ap- 
proach that handles the effects of variable binding depen- 
dencies and the nested block structure on XQuery contain- 
ment. Our approach provides sufficient conditions for solv- 
ing nested XQuery containment. 

An immediate item of future work is to incorporate the
XQuery logical optimization technique in [9] into our
normalization step to reduce the possible navigation redundancies
in the VarTree representation. This would help to prune the
space for conducting containment mapping. However, the lack
of this optimization step as of now does not impact the
soundness of the approach.

The XQuery fragment defined in this paper provides a 
good scope for us to focus on a set of important XQuery 
features with respect to the containment problem. We plan 
to extend the proposed containment mapping approach to 
accommodate a broader fragment of XQuery that includes
disjunctions, aggregations, and other features as well as to
consider more general constraints in XML and XQuery.